High-Level Syntax Designs for Point Cloud Coding

ABSTRACT

A method of point cloud coding (PCC) implemented by a video decoder. The method includes receiving an encoded bitstream including a unit header, the unit header containing a type indicator specifying a type of content carried in a payload, and decoding the encoded bitstream.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of International Application No. PCT/US2019/027064 filed on Apr. 11, 2019, by Futurewei Technologies, Inc., and titled “High-Level Syntax Designs for Point Cloud Coding,” which claims the benefit of U.S. Provisional Patent Application No. 62/690,132, filed Jun. 26, 2018, by Ye-Kui Wang, and titled “High-Level Syntax Designs for Point Cloud Coding,” each of which is hereby incorporated by reference.

TECHNICAL FIELD

The present disclosure is generally related to point cloud coding, and is specifically related to the high-level syntax for point cloud coding.

BACKGROUND

The point cloud is employed in a wide variety of applications including the entertainment industry, intelligent automobile navigation, geospatial inspection, three dimensional (3D) modeling of real world objects, visualization, etc. Considering the non-uniform sampling geometry of the point cloud, compact representations for storage and transmission of such data are useful. Compared with the other 3D presentations, the irregular point cloud is more general and applicable for a wider range of sensors and data acquisition strategies. For example, when performing a 3D presentation in a virtual reality world or remote renderings in a tele-presence environment, the rendering of virtual figures and real-time instructions are processed as a dense point cloud data set.

SUMMARY

A first aspect relates to a method of point cloud coding (PCC) implemented by a video decoder. The method includes receiving an encoded bitstream including a unit header and a data unit, the unit header containing a type indicator specifying a type of content carried in a payload of the data unit; and decoding the encoded bitstream.

A second aspect relates to a method of point cloud coding (PCC) implemented by a video encoder. The method includes generating an encoded bitstream including a unit header, the unit header containing a type indicator specifying a type of content carried in a payload; and transmitting the encoded bitstream toward a decoder.

The methods provide high-level syntax designs that solve one or more of the problems associated with point cloud coding as described below. Therefore, the process of video coding and the video codec are improved and made more efficient.

In a first implementation form of the method according to the first or second aspect as such, the unit header is a PCC network abstraction layer (NAL) unit header.

In a second implementation form of the method according to the first or second aspect as such or any preceding implementation form of the first or second aspect, the encoded bitstream further comprises a data unit, and wherein the data unit is a PCC NAL unit.

In a third implementation form of the method according to the first or second aspect as such or any preceding implementation form of the first or second aspect, the type indicator specifies that the type of content is a geometry component.

In a fourth implementation form of the method according to the first or second aspect as such or any preceding implementation form of the first or second aspect, the type indicator specifies that the type of content is a texture component.

In a fifth implementation form of the method according to the first or second aspect as such or any preceding implementation form of the first or second aspect, the type indicator specifies that the type of content is a geometry component or a texture component.

In a sixth implementation form of the method according to the first or second aspect as such or any preceding implementation form of the first or second aspect, the type indicator specifies that the type of content is auxiliary information.

In a seventh implementation form of the method according to the first or second aspect as such or any preceding implementation form of the first or second aspect, the type indicator specifies that the type of content is an occupancy map.

In an eighth implementation form of the method according to the first or second aspect as such or any preceding implementation form of the first or second aspect, the payload comprises a High Efficiency Video Coding (HEVC) NAL unit.

In a ninth implementation form of the method according to the first or second aspect as such or any preceding implementation form of the first or second aspect, the payload comprises an Advanced Video Coding (AVC) NAL unit.

In a tenth implementation form of the method according to the first or second aspect as such or any preceding implementation form of the first or second aspect, the type indicator comprises five bits.

In an eleventh implementation form of the method according to the first or second aspect as such or any preceding implementation form of the first or second aspect, the type indicator consists of seven bits.

In a twelfth implementation form of the method according to the first or second aspect as such or any preceding implementation form of the first or second aspect, the geometry component comprises a set of coordinates associated with a point cloud frame.

In a thirteenth implementation form of the method according to the first or second aspect as such or any preceding implementation form of the first or second aspect, the set of coordinates are Cartesian coordinates.

In a fourteenth implementation form of the method according to the first or second aspect as such or any preceding implementation form of the first or second aspect, the texture component comprises a set of luma sample values of a point cloud frame.

A third aspect relates to a coding apparatus that includes a receiver configured to receive a picture to encode or to receive a bitstream to decode, a transmitter coupled to the receiver, the transmitter configured to transmit the bitstream to a decoder or to transmit a decoded image to a display, a memory coupled to at least one of the receiver or the transmitter, the memory configured to store instructions, and a processor coupled to the memory, the processor configured to execute the instructions stored in the memory to perform the method of any of the preceding aspects or implementations.

The coding apparatus utilizes high-level syntax designs that solve one or more of the problems associated with point cloud coding as described below. Therefore, the process of video coding and the video codec are improved and made more efficient.

In a first implementation form of the apparatus according to the third aspect as such, the apparatus further includes a display configured to display an image.

A fourth aspect relates to a system that includes an encoder and a decoder in communication with the encoder. The encoder or the decoder includes the coding apparatus of any of the preceding aspects or implementations.

The system utilizes high-level syntax designs that solve one or more of the problems associated with point cloud coding as described below. Therefore, the process of video coding and the video codec are improved and made more efficient.

A fifth aspect relates to a means for coding that includes receiving means configured to receive a picture to encode or to receive a bitstream to decode, transmission means coupled to the receiving means, the transmission means configured to transmit the bitstream to a decoder or to transmit a decoded image to a display means, storage means coupled to at least one of the receiving means or the transmission means, the storage means configured to store instructions, and processing means coupled to the storage means, the processing means configured to execute the instructions stored in the storage means to perform the methods in any of the preceding aspects or implementations.

The means for coding utilizes high-level syntax designs that solve one or more of the problems associated with point cloud coding as described below. Therefore, the process of video coding and the video codec are improved and made more efficient.

For the purpose of clarity, any one of the foregoing embodiments may be combined with any one or more of the other foregoing embodiments to create a new embodiment within the scope of the present disclosure.

These and other features will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings and claims.

BRIEF DESCRIPTION OF THE DRAWINGS

For a more complete understanding of this disclosure, reference is now made to the following brief description, taken in connection with the accompanying drawings and detailed description, wherein like reference numerals represent like parts.

FIG. 1 is a block diagram illustrating an example coding system that may utilize PCC video coding techniques.

FIG. 2 is a block diagram illustrating an example video encoder that may implement video coding techniques.

FIG. 3 is a block diagram illustrating an example video decoder that may implement video coding techniques.

FIG. 4 is a schematic diagram of an embodiment of a data structure compatible with PCC.

FIG. 5 is an embodiment of a method of point cloud coding implemented by a video decoder.

FIG. 6 is an embodiment of a method of point cloud coding implemented by a video encoder.

FIG. 7 is a schematic diagram of an example video coding device.

FIG. 8 is a schematic diagram of an embodiment of a means for coding.

DETAILED DESCRIPTION

It should be understood at the outset that although an illustrative implementation of one or more embodiments is provided below, the disclosed systems and/or methods may be implemented using any number of techniques, whether currently known or in existence. The disclosure should in no way be limited to the illustrative implementations, drawings, and techniques illustrated below, including the exemplary designs and implementations illustrated and described herein, but may be modified within the scope of the appended claims along with their full scope of equivalents.

Video coding standards include International Telecommunications Union Telecommunication Standardization Sector (ITU-T) H.261, International Organization for Standardization (ISO)/International Electrotechnical Commission (IEC) Moving Picture Experts Group (MPEG)-1 Part 2, ITU-T H.262 or ISO/IEC MPEG-2 Part 2, ITU-T H.263, ISO/IEC MPEG-4 Part 2, Advanced Video Coding (AVC), also known as ITU-T H.264 or ISO/IEC MPEG-4 Part 10, and High Efficiency Video Coding (HEVC), also known as ITU-T H.265 or MPEG-H Part 2. AVC includes extensions such as Scalable Video Coding (SVC), Multiview Video Coding (MVC), Multiview Video Coding plus Depth (MVC+D), and 3D AVC (3D-AVC). HEVC includes extensions such as Scalable HEVC (SHVC), Multiview HEVC (MV-HEVC), and 3D HEVC (3D-HEVC).

A point cloud is a set of data points in the 3D space. Each data point consists of parameters that determine a position (e.g., X, Y, Z), a color (e.g., R, G, B or Y, U, V), and possibly other properties like transparency, reflectance, time of acquisition, etc. Typically, each point in a cloud has the same number of attributes attached to it. Point clouds may be used in various applications such as real-time 3D immersive telepresence, content virtual reality (VR) viewing with interactive parallax, 3D free viewpoint sports replay broadcasting, geographic information systems, cultural heritage, autonomous navigation based on large-scale 3D dynamic maps, and automotive applications.
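By way of a non-normative illustration, the following sketch (in Python, which is also used for the other sketches in this description) shows one way such a data point might be represented; the type and field names are purely illustrative and are not defined by any PCC specification.

from dataclasses import dataclass, field
from typing import Dict

@dataclass
class PointCloudPoint:
    # Position, e.g., Cartesian X, Y, Z coordinates.
    x: float
    y: float
    z: float
    # Color, e.g., R, G, B (Y, U, V would serve equally well).
    r: int = 0
    g: int = 0
    b: int = 0
    # Other possible properties, e.g., transparency, reflectance,
    # or time of acquisition.
    attributes: Dict[str, float] = field(default_factory=dict)

# A point cloud is then a collection of such points, each point
# typically carrying the same number of attributes.
cloud = [PointCloudPoint(1.0, 2.0, 3.0, 255, 0, 0, {"reflectance": 0.7})]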

The ISO/IEC Moving Picture Experts Group (MPEG) began in 2016 the development of a new codec standard on Point Cloud Coding for lossless and lossy compressed point cloud data with substantial coding efficiency and robustness to network environments. The use of this codec standard allows point clouds to be manipulated as a form of computer data and to be stored on various storage media, transmitted and received over existing and future networks, and distributed on existing and future broadcasting channels.

Recently, the point cloud coding (PCC) work was classified into three categories, PCC category 1, PCC category 2, and PCC category 3, wherein two separate working drafts were being developed, one for PCC category 2 (PCC Cat2), and the other for PCC categories 1 and 3 (PCC Cat13). The latest working draft (WD) for PCC Cat2 is included in MPEG output document N17534, and the latest WD for PCC Cat13 is included in MPEG output document N17533.

The main philosophy behind the design of the PCC Cat2 codec in the PCC Cat2 WD is to leverage existing video codecs to compress the geometry and texture information of a dynamic point cloud, by compressing the point cloud data as a set of different video sequences. In particular, two video sequences, one representing the geometry information of the point cloud data and another representing the texture information, are generated and compressed by using video codecs. Additional metadata to interpret the two video sequences, i.e., an occupancy map and auxiliary patch information, is also generated and compressed separately.

Unfortunately, the existing designs of PCC have drawbacks. For example, data units pertaining to one time instance, i.e., one access unit (AU), are not contiguous in decoding order. In the PCC Cat2 WD, the data units of texture, geometry, auxiliary information, and the occupancy map for each AU are interleaved in the units of group of frames. That is, the geometry data for all the frames in the group is together; the same is true for the texture data, and so on. In the PCC Cat13 WD, the data units of geometry and the general attributes for each AU are interleaved on the level of the entire PCC bitstream (e.g., the same as in the PCC Cat2 WD when there is only one group of frames that has the same length as the entire PCC bitstream). Interleaving of data units belonging to one AU inherently causes a huge end-to-end delay that is at least equal to the length of the group of frames in presentation time duration in application systems.
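The delay caused by interleaving can be illustrated with a short, non-normative sketch in which placeholder strings stand in for actual data units. Under group-of-frames ordering, the first AU is complete only after the whole group has arrived; under AU-contiguous ordering, each AU can be consumed as soon as its own data units arrive.

def group_of_frames_order(geometry, texture, aux, occupancy):
    # All geometry units of the group first, then all texture units,
    # and so on: the first AU completes only after the entire group.
    return geometry + texture + aux + occupancy

def access_unit_order(geometry, texture, aux, occupancy):
    # All data units of one time instance (one AU) are contiguous in
    # decoding order, so each AU is decodable as soon as it arrives.
    stream = []
    for au_units in zip(geometry, texture, aux, occupancy):
        stream.extend(au_units)
    return stream

geo = ["geo0", "geo1", "geo2"]
tex = ["tex0", "tex1", "tex2"]
aux = ["aux0", "aux1", "aux2"]
occ = ["occ0", "occ1", "occ2"]
print(group_of_frames_order(geo, tex, aux, occ))  # geo0 geo1 geo2 tex0 ...
print(access_unit_order(geo, tex, aux, occ))      # geo0 tex0 aux0 occ0 geo1 ...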

Another drawback relates to the bitstream format. The bitstream format allows emulation of a start code pattern like 0x0003 and therefore does not work for transmission over the MPEG-2 transport stream (TS), where start code emulation prevention is needed. For PCC Cat2, currently only group_of_frames_geometry_video_payload( ) and group_of_frames_texture_video_payload( ) have start code emulation prevention in place when either HEVC or AVC is used for coding of the geometry and texture components. For PCC Cat13, start code emulation prevention is not in place anywhere in the bitstream.

In the PCC Cat2 WD, some of the codec information (e.g., which codec, profile, level, etc., of the codec) for the geometry and texture bitstreams is deeply buried in the multiple instances of the structures group_of_frames_geometry_video_payload( ) and group_of_frames_texture_video_payload( ). Furthermore, some of the information, like the profile and level that indicate the capabilities for decoding of the auxiliary information and occupancy map components, as well as for point cloud reconstruction, is missing.

Disclosed herein are high-level syntax designs that solve one or more of the aforementioned problems associated with point cloud coding. As will be more fully explained below, the present disclosure utilizes a type indicator in a data unit header (a.k.a., a PCC network abstraction layer (NAL) unit header or simply a unit header) to specify the type of content in the payload of the PCC NAL unit. In addition, the present disclosure utilizes a group of frames header NAL unit to carry the group of frames header parameters. The group of frames header NAL unit may also be used to signal the profile and level of each geometry or texture bitstream.

FIG. 1 is a block diagram illustrating an example coding system 10 that may utilize PCC video coding techniques. As shown in FIG. 1, the coding system 10 includes a source device 12 that provides encoded video data to be decoded at a later time by a destination device 14. In particular, the source device 12 may provide the video data to destination device 14 via a computer-readable medium 16. Source device 12 and destination device 14 may comprise any of a wide range of devices, including desktop computers, notebook (e.g., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called “smart” phones, so-called “smart” pads, televisions, cameras, display devices, digital media players, video gaming consoles, video streaming devices, or the like. In some cases, source device 12 and destination device 14 may be equipped for wireless communication.

Destination device 14 may receive the encoded video data to be decoded via computer-readable medium 16. Computer-readable medium 16 may comprise any type of medium or device capable of moving the encoded video data from source device 12 to destination device 14. In one example, computer-readable medium 16 may comprise a communication medium to enable source device 12 to transmit encoded video data directly to destination device 14 in real-time. The encoded video data may be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to destination device 14. The communication medium may comprise any wireless or wired communication medium, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The communication medium may form part of a packet-based network, such as a local area network, a wide-area network, or a global network such as the Internet. The communication medium may include routers, switches, base stations, or any other equipment that may be useful to facilitate communication from source device 12 to destination device 14.

In some examples, encoded data may be output from output interface 22 to a storage device. Similarly, encoded data may be accessed from the storage device by input interface. The storage device may include any of a variety of distributed or locally accessed data storage media such as a hard drive, Blu-ray discs, digital video disks (DVDs), Compact Disc Read-Only Memories (CD-ROMs), flash memory, volatile or non-volatile memory, or any other suitable digital storage media for storing encoded video data. In a further example, the storage device may correspond to a file server or another intermediate storage device that may store the encoded video generated by source device 12. Destination device 14 may access stored video data from the storage device via streaming or download. The file server may be any type of server capable of storing encoded video data and transmitting that encoded video data to the destination device 14. Example file servers include a web server (e.g., for a website), a file transfer protocol (FTP) server, network attached storage (NAS) devices, or a local disk drive. Destination device 14 may access the encoded video data through any standard data connection, including an Internet connection. This may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., digital subscriber line (DSL), cable modem, etc.), or a combination of both that is suitable for accessing encoded video data stored on a file server. The transmission of encoded video data from the storage device may be a streaming transmission, a download transmission, or a combination thereof.

The techniques of this disclosure are not necessarily limited to wireless applications or settings. The techniques may be applied to video coding in support of any of a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, Internet streaming video transmissions, such as dynamic adaptive streaming over HTTP (DASH), digital video that is encoded onto a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some examples, coding system 10 may be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.

In the example of FIG. 1, source device 12 includes video source 18, video encoder 20, and output interface 22. Destination device 14 includes input interface 28, video decoder 30, and display device 32. In accordance with this disclosure, video encoder 20 of the source device 12 and/or the video decoder 30 of the destination device 14 may be configured to apply the techniques for video coding. In other examples, a source device and a destination device may include other components or arrangements. For example, source device 12 may receive video data from an external video source, such as an external camera. Likewise, destination device 14 may interface with an external display device, rather than including an integrated display device.

The illustrated coding system 10 of FIG. 1 is merely one example. Techniques for video coding may be performed by any digital video encoding and/or decoding device. Although the techniques of this disclosure generally are performed by a video coding device, the techniques may also be performed by a video encoder/decoder, typically referred to as a “CODEC.” Moreover, the techniques of this disclosure may also be performed by a video preprocessor. The video encoder and/or the decoder may be a graphics processing unit (GPU) or a similar device.

Source device 12 and destination device 14 are merely examples of such coding devices in which source device 12 generates coded video data for transmission to destination device 14. In some examples, source device 12 and destination device 14 may operate in a substantially symmetrical manner such that each of the source and destination devices 12, 14 includes video encoding and decoding components. Hence, coding system 10 may support one-way or two-way video transmission between video devices 12, 14, e.g., for video streaming, video playback, video broadcasting, or video telephony.

Video source 18 of source device 12 may include a video capture device, such as a video camera, a video archive containing previously captured video, and/or a video feed interface to receive video from a video content provider. As a further alternative, video source 18 may generate computer graphics-based data as the source video, or a combination of live video, archived video, and computer-generated video.

In some cases, when video source 18 is a video camera, source device 12 and destination device 14 may form so-called camera phones or video phones. As mentioned above, however, the techniques described in this disclosure may be applicable to video coding in general, and may be applied to wireless and/or wired applications. In each case, the captured, pre-captured, or computer-generated video may be encoded by video encoder 20. The encoded video information may then be output by output interface 22 onto a computer-readable medium 16.

Computer-readable medium 16 may include transient media, such as a wireless broadcast or wired network transmission, or storage media (that is, non-transitory storage media), such as a hard disk, flash drive, compact disc, digital video disc, Blu-ray disc, or other computer-readable media. In some examples, a network server (not shown) may receive encoded video data from source device 12 and provide the encoded video data to destination device 14, e.g., via network transmission. Similarly, a computing device of a medium production facility, such as a disc stamping facility, may receive encoded video data from source device 12 and produce a disc containing the encoded video data. Therefore, computer-readable medium 16 may be understood to include one or more computer-readable media of various forms, in various examples.

Input interface 28 of destination device 14 receives information from computer-readable medium 16. The information of computer-readable medium 16 may include syntax information defined by video encoder 20, which is also used by video decoder 30, that includes syntax elements that describe characteristics and/or processing of blocks and other coded units, e.g., group of pictures (GOPs). Display device 32 displays the decoded video data to a user, and may comprise any of a variety of display devices such as a cathode ray tube (CRT), a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or another type of display device.

Video encoder 20 and video decoder 30 may operate according to a video coding standard, such as the High Efficiency Video Coding (HEVC) standard presently under development, and may conform to the HEVC Test Model (HM). Alternatively, video encoder 20 and video decoder 30 may operate according to other proprietary or industry standards, such as the International Telecommunications Union Telecommunication Standardization Sector (ITU-T) H.264 standard, alternatively referred to as Moving Picture Experts Group (MPEG)-4, Part 10, Advanced Video Coding (AVC), H.265/HEVC, or extensions of such standards. The techniques of this disclosure, however, are not limited to any particular coding standard. Other examples of video coding standards include MPEG-2 and ITU-T H.263. Although not shown in FIG. 1, in some aspects, video encoder 20 and video decoder 30 may each be integrated with an audio encoder and decoder, and may include appropriate multiplexer-demultiplexer (MUX-DEMUX) units, or other hardware and software, to handle encoding of both audio and video in a common data stream or separate data streams. If applicable, MUX-DEMUX units may conform to the ITU H.223 multiplexer protocol, or other protocols such as the user datagram protocol (UDP).

Video encoder 20 and video decoder 30 each may be implemented as any of a variety of suitable encoder circuitry, such as one or more microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware or any combinations thereof. When the techniques are implemented partially in software, a device may store instructions for the software in a suitable, non-transitory computer-readable medium and execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (CODEC) in a respective device. A device including video encoder 20 and/or video decoder 30 may comprise an integrated circuit, a microprocessor, and/or a wireless communication device, such as a cellular telephone.

FIG. 2 is a block diagram illustrating an example of video encoder 20 that may implement video coding techniques. Video encoder 20 may perform intra- and inter-coding of video blocks within video slices. Intra-coding relies on spatial prediction to reduce or remove spatial redundancy in video within a given video frame or picture. Inter-coding relies on temporal prediction to reduce or remove temporal redundancy in video within adjacent frames or pictures of a video sequence. Intra-mode (I mode) may refer to any of several spatial based coding modes. Inter-modes, such as uni-directional prediction (a.k.a., uni-prediction) (P mode) or bi-prediction (B mode), may refer to any of several temporal-based coding modes.

As shown in FIG. 2, video encoder 20 receives a current video block within a video frame to be encoded. In the example of FIG. 2, video encoder 20 includes mode select unit 40, reference frame memory 64, summer 50, transform processing unit 52, quantization unit 54, and entropy coding unit 56. Mode select unit 40, in turn, includes motion compensation unit 44, motion estimation unit 42, intra-prediction (a.k.a., intra prediction) unit 46, and partition unit 48. For video block reconstruction, video encoder 20 also includes inverse quantization unit 58, inverse transform unit 60, and summer 62. A deblocking filter (not shown in FIG. 2) may also be included to filter block boundaries to remove blockiness artifacts from reconstructed video. If desired, the deblocking filter would typically filter the output of summer 62. Additional filters (in loop or post loop) may also be used in addition to the deblocking filter. Such filters are not shown for brevity, but if desired, may filter the output of summer 50 (as an in-loop filter).

During the encoding process, video encoder 20 receives a video frame or slice to be coded. The frame or slice may be divided into multiple video blocks. Motion estimation unit 42 and motion compensation unit 44 perform inter-predictive coding of the received video block relative to one or more blocks in one or more reference frames to provide temporal prediction. Intra-prediction unit 46 may alternatively perform intra-predictive coding of the received video block relative to one or more neighboring blocks in the same frame or slice as the block to be coded to provide spatial prediction. Video encoder 20 may perform multiple coding passes, e.g., to select an appropriate coding mode for each block of video data.

Moreover, partition unit 48 may partition blocks of video data into sub-blocks, based on evaluation of previous partitioning schemes in previous coding passes. For example, partition unit 48 may initially partition a frame or slice into largest coding units (LCUs), and partition each of the LCUs into sub-coding units (sub-CUs) based on rate-distortion analysis (e.g., rate-distortion optimization). Mode select unit 40 may further produce a quad-tree data structure indicative of partitioning of a LCU into sub-CUs. Leaf-node CUs of the quad-tree may include one or more prediction units (PUs) and one or more transform units (TUs).

The present disclosure uses the term “block” to refer to any of a CU, PU, or TU, in the context of HEVC, or similar data structures in the context of other standards (e.g., macroblocks and sub-blocks thereof in H.264/AVC). A CU includes a coding node, PUs, and TUs associated with the coding node. A size of the CU corresponds to a size of the coding node and is square in shape. The size of the CU may range from 8×8 pixels up to the size of the treeblock with a maximum of 64×64 pixels or greater. Each CU may contain one or more PUs and one or more TUs. Syntax data associated with a CU may describe, for example, partitioning of the CU into one or more PUs. Partitioning modes may differ between whether the CU is skip or direct mode encoded, intra-prediction mode encoded, or inter-prediction (a.k.a., inter prediction) mode encoded. PUs may be partitioned to be non-square in shape. Syntax data associated with a CU may also describe, for example, partitioning of the CU into one or more TUs according to a quad-tree. A TU can be square or non-square (e.g., rectangular) in shape.

Mode select unit 40 may select one of the coding modes, intra- or inter-, e.g., based on error results, and provides the resulting intra- or inter-coded block to summer 50 to generate residual block data and to summer 62 to reconstruct the encoded block for use as a reference frame. Mode select unit 40 also provides syntax elements, such as motion vectors, intra-mode indicators, partition information, and other such syntax information, to entropy coding unit 56.

Motion estimation unit 42 and motion compensation unit 44 may be highly integrated, but are illustrated separately for conceptual purposes. Motion estimation, performed by motion estimation unit 42, is the process of generating motion vectors, which estimate motion for video blocks. A motion vector, for example, may indicate the displacement of a PU of a video block within a current video frame or picture relative to a predictive block within a reference frame (or other coded unit) relative to the current block being coded within the current frame (or other coded unit). A predictive block is a block that is found to closely match the block to be coded, in terms of pixel difference, which may be determined by sum of absolute difference (SAD), sum of square difference (SSD), or other difference metrics. In some examples, video encoder 20 may calculate values for sub-integer pixel positions of reference pictures stored in reference frame memory 64. For example, video encoder 20 may interpolate values of one-quarter pixel positions, one-eighth pixel positions, or other fractional pixel positions of the reference picture. Therefore, motion estimation unit 42 may perform a motion search relative to the full pixel positions and fractional pixel positions and output a motion vector with fractional pixel precision.
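As a non-normative illustration of the pixel-difference metrics mentioned above, the sketch below computes SAD and SSD between the block being coded and a candidate predictive block, with plain nested lists standing in for real sample buffers.

def sad(block, candidate):
    # Sum of absolute differences between two equally sized blocks.
    return sum(abs(a - b)
               for row_a, row_b in zip(block, candidate)
               for a, b in zip(row_a, row_b))

def ssd(block, candidate):
    # Sum of square differences; penalizes large errors more than SAD.
    return sum((a - b) ** 2
               for row_a, row_b in zip(block, candidate)
               for a, b in zip(row_a, row_b))

current    = [[10, 12], [11, 13]]
predictive = [[ 9, 12], [12, 13]]
print(sad(current, predictive), ssd(current, predictive))  # 2 2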

Motion estimation unit 42 calculates a motion vector for a PU of a video block in an inter-coded slice by comparing the position of the PU to the position of a predictive block of a reference picture. The reference picture may be selected from a first reference picture list (List 0) or a second reference picture list (List 1), each of which identifies one or more reference pictures stored in reference frame memory 64. Motion estimation unit 42 sends the calculated motion vector to entropy encoding unit 56 and motion compensation unit 44.

Motion compensation, performed by motion compensation unit 44, may involve fetching or generating the predictive block based on the motion vector determined by motion estimation unit 42. Again, motion estimation unit 42 and motion compensation unit 44 may be functionally integrated, in some examples. Upon receiving the motion vector for the PU of the current video block, motion compensation unit 44 may locate the predictive block to which the motion vector points in one of the reference picture lists. Summer 50 forms a residual video block by subtracting pixel values of the predictive block from the pixel values of the current video block being coded, forming pixel difference values, as discussed below. In general, motion estimation unit 42 performs motion estimation relative to luma components, and motion compensation unit 44 uses motion vectors calculated based on the luma components for both chroma components and luma components. Mode select unit 40 may also generate syntax elements associated with the video blocks and the video slice for use by video decoder 30 in decoding the video blocks of the video slice.

Intra-prediction unit 46 may intra-predict a current block, as an alternative to the inter-prediction performed by motion estimation unit 42 and motion compensation unit 44, as described above. In particular, intra-prediction unit 46 may determine an intra-prediction mode to use to encode a current block. In some examples, intra-prediction unit 46 may encode a current block using various intra-prediction modes, e.g., during separate encoding passes, and intra-prediction unit 46 (or mode select unit 40, in some examples) may select an appropriate intra-prediction mode to use from the tested modes.

For example, intra-prediction unit 46 may calculate rate-distortion values using a rate-distortion analysis for the various tested intra-prediction modes, and select the intra-prediction mode having the best rate-distortion characteristics among the tested modes. Rate-distortion analysis generally determines an amount of distortion (or error) between an encoded block and an original, unencoded block that was encoded to produce the encoded block, as well as a bitrate (that is, a number of bits) used to produce the encoded block. Intra-prediction unit 46 may calculate ratios from the distortions and rates for the various encoded blocks to determine which intra-prediction mode exhibits the best rate-distortion value for the block.
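The selection described above is commonly expressed as minimizing a Lagrangian cost J = D + λ·R over the tested modes. The following non-normative sketch assumes each candidate mode reports its distortion and bit cost; the mode names and the multiplier value are illustrative only.

def best_mode(candidates, lam):
    # Pick the mode minimizing the Lagrangian cost J = D + lambda * R,
    # where candidates is an iterable of (mode, distortion, bits) tuples.
    return min(candidates, key=lambda c: c[1] + lam * c[2])

modes = [("DC", 120.0, 10), ("planar", 100.0, 25), ("angular_10", 90.0, 40)]
print(best_mode(modes, lam=1.0))  # ('planar', 100.0, 25): J = 125 is smallest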

In addition, intra-prediction unit 46 may be configured to code depth blocks of a depth map using a depth modeling mode (DMM). Mode select unit 40 may determine whether an available DMM mode produces better coding results than an intra-prediction mode and the other DMM modes, e.g., using rate-distortion optimization (RDO). Data for a texture image corresponding to a depth map may be stored in reference frame memory 64. Motion estimation unit 42 and motion compensation unit 44 may also be configured to inter-predict depth blocks of a depth map.

After selecting an intra-prediction mode for a block (e.g., a conventional intra-prediction mode or one of the DMM modes), intra-prediction unit 46 may provide information indicative of the selected intra-prediction mode for the block to entropy coding unit 56. Entropy coding unit 56 may encode the information indicating the selected intra-prediction mode. Video encoder 20 may include in the transmitted bitstream configuration data, which may include a plurality of intra-prediction mode index tables and a plurality of modified intra-prediction mode index tables (also referred to as codeword mapping tables), definitions of encoding contexts for various blocks, and indications of a most probable intra-prediction mode, an intra-prediction mode index table, and a modified intra-prediction mode index table to use for each of the contexts.

Video encoder 20 forms a residual video block by subtracting the prediction data from mode select unit 40 from the original video block being coded. Summer 50 represents the component or components that perform this subtraction operation.

Transform processing unit 52 applies a transform, such as a discrete cosine transform (DCT) or a conceptually similar transform, to the residual block, producing a video block comprising residual transform coefficient values. Transform processing unit 52 may perform other transforms which are conceptually similar to DCT. Wavelet transforms, integer transforms, sub-band transforms or other types of transforms could also be used.

Transform processing unit 52 applies the transform to the residual block, producing a block of residual transform coefficients. The transform may convert the residual information from a pixel value domain to a transform domain, such as a frequency domain. Transform processing unit 52 may send the resulting transform coefficients to quantization unit 54. Quantization unit 54 quantizes the transform coefficients to further reduce bit rate. The quantization process may reduce the bit depth associated with some or all of the coefficients. The degree of quantization may be modified by adjusting a quantization parameter. In some examples, quantization unit 54 may then perform a scan of the matrix including the quantized transform coefficients. Alternatively, entropy encoding unit 56 may perform the scan.
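A non-normative sketch of this kind of uniform quantization follows. The step-size derivation (step roughly doubling for every increase of 6 in the quantization parameter, as in HEVC-style codecs) is an illustrative approximation, not the normative formula of any standard.

def quantize(coefficients, qp):
    # Larger qp -> larger step -> coarser levels and fewer bits.
    step = 2 ** ((qp - 4) / 6.0)
    return [int(round(c / step)) for c in coefficients]

def dequantize(levels, qp):
    # Inverse mapping used by the decoder (and the encoder's
    # reconstruction path); the rounding error is the coding loss.
    step = 2 ** ((qp - 4) / 6.0)
    return [level * step for level in levels]

coeffs = [100, -37, 8, 0]
print(quantize(coeffs, qp=22))  # finer:   [12, -5, 1, 0]
print(quantize(coeffs, qp=34))  # coarser: [3, -1, 0, 0]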

Following quantization, entropy coding unit 56 entropy codes the quantized transform coefficients. For example, entropy coding unit 56 may perform context adaptive variable length coding (CAVLC), context adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding or another entropy coding technique. In the case of context-based entropy coding, context may be based on neighboring blocks. Following the entropy coding by entropy coding unit 56, the encoded bitstream may be transmitted to another device (e.g., video decoder 30) or archived for later transmission or retrieval.

Inverse quantization unit 58 and inverse transform unit 60 apply inverse quantization and inverse transformation, respectively, to reconstruct the residual block in the pixel domain, e.g., for later use as a reference block. Motion compensation unit 44 may calculate a reference block by adding the residual block to a predictive block of one of the frames of reference frame memory 64. Motion compensation unit 44 may also apply one or more interpolation filters to the reconstructed residual block to calculate sub-integer pixel values for use in motion estimation. Summer 62 adds the reconstructed residual block to the motion compensated prediction block produced by motion compensation unit 44 to produce a reconstructed video block for storage in reference frame memory 64. The reconstructed video block may be used by motion estimation unit 42 and motion compensation unit 44 as a reference block to inter-code a block in a subsequent video frame.

FIG. 3 is a block diagram illustrating an example of video decoder 30 that may implement video coding techniques. In the example of FIG. 3, video decoder 30 includes an entropy decoding unit 70, motion compensation unit 72, intra-prediction unit 74, inverse quantization unit 76, inverse transformation unit 78, reference frame memory 82, and summer 80. Video decoder 30 may, in some examples, perform a decoding pass generally reciprocal to the encoding pass described with respect to video encoder 20 (FIG. 2). Motion compensation unit 72 may generate prediction data based on motion vectors received from entropy decoding unit 70, while intra-prediction unit 74 may generate prediction data based on intra-prediction mode indicators received from entropy decoding unit 70.

During the decoding process, video decoder 30 receives an encoded video bitstream that represents video blocks of an encoded video slice and associated syntax elements from video encoder 20. Entropy decoding unit 70 of the video decoder 30 entropy decodes the bitstream to generate quantized coefficients, motion vectors or intra-prediction mode indicators, and other syntax elements. Entropy decoding unit 70 forwards the motion vectors and other syntax elements to motion compensation unit 72. Video decoder 30 may receive the syntax elements at the video slice level and/or the video block level.

When the video slice is coded as an intra-coded (I) slice, intra-prediction unit 74 may generate prediction data for a video block of the current video slice based on a signaled intra-prediction mode and data from previously decoded blocks of the current frame or picture. When the video frame is coded as an inter-coded (e.g., B, P, or GPB) slice, motion compensation unit 72 produces predictive blocks for a video block of the current video slice based on the motion vectors and other syntax elements received from entropy decoding unit 70. The predictive blocks may be produced from one of the reference pictures within one of the reference picture lists. Video decoder 30 may construct the reference frame lists, List 0 and List 1, using default construction techniques based on reference pictures stored in reference frame memory 82.

Motion compensation unit 72 determines prediction information for a video block of the current video slice by parsing the motion vectors and other syntax elements, and uses the prediction information to produce the predictive blocks for the current video block being decoded. For example, motion compensation unit 72 uses some of the received syntax elements to determine a prediction mode (e.g., intra- or inter-prediction) used to code the video blocks of the video slice, an inter-prediction slice type (e.g., B slice, P slice, or GPB slice), construction information for one or more of the reference picture lists for the slice, motion vectors for each inter-encoded video block of the slice, inter-prediction status for each inter-coded video block of the slice, and other information to decode the video blocks in the current video slice.

Motion compensation unit 72 may also perform interpolation based on interpolation filters. Motion compensation unit 72 may use interpolation filters as used by video encoder 20 during encoding of the video blocks to calculate interpolated values for sub-integer pixels of reference blocks. In this case, motion compensation unit 72 may determine the interpolation filters used by video encoder 20 from the received syntax elements and use the interpolation filters to produce predictive blocks.

Data for a texture image corresponding to a depth map may be stored in reference frame memory 82. Motion compensation unit 72 may also be configured to inter-predict depth blocks of a depth map.

Keeping the above in mind, some of the basic concepts of the present disclosure are discussed.

For PCC Cat2, to solve the first problem described above, the data units pertaining to one time instance (e.g., one access unit) should be placed contiguously in decoding order in a bitstream. Once they are placed contiguously in decoding order in the bitstream, identification of the type of each data unit allows for routing of each data unit to the correct decoder component. The design should also avoid violating the main design principle behind the PCC Cat2 codec, which is to leverage existing video codecs to compress the geometry and texture information of a dynamic point cloud.

To be able to leverage existing video codecs, e.g., taking HEVC as the example, to separately compress the geometry and texture information, while at the same time having one single, self-contained PCC Cat2 bitstream, the following aspects should be clearly specified: (1) extraction/construction of a conforming HEVC bitstream for the geometry component out of the PCC Cat2 bitstream; (2) extraction/construction of a conforming HEVC bitstream for the texture component out of the PCC Cat2 bitstream; and (3) signaling/indication of the conformance point, i.e., profile, tier, and level, of each of the extracted conforming HEVC bitstreams for the geometry and texture components.
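A non-normative sketch of the kind of extraction process that items (1) and (2) call for, under the first set of methods described below: each PCC NAL unit is modeled as a (type, payload) pair, and extraction keeps, in bitstream order, the payloads (HEVC NAL units) whose PCC NAL unit type matches the requested component. The byte strings are placeholders.

GEOMETRY_D0, GEOMETRY_D1, TEXTURE_NALU = "GEOMETRY_D0", "GEOMETRY_D1", "TEXTURE_NALU"

def extract_video_bitstream(pcc_nal_units, wanted_types):
    # Keep the payloads of the PCC NAL units whose type is wanted,
    # preserving decoding order; the result is the elementary video
    # bitstream for one component.
    return [payload for pcc_type, payload in pcc_nal_units
            if pcc_type in wanted_types]

stream = [("GOF_HEADER", b"group-of-frames-header"),
          (GEOMETRY_D0, b"hevc-nal-geometry-d0"),
          (GEOMETRY_D1, b"hevc-nal-geometry-d1"),
          (TEXTURE_NALU, b"hevc-nal-texture")]
geometry_hevc = extract_video_bitstream(stream, {GEOMETRY_D0, GEOMETRY_D1})
texture_hevc = extract_video_bitstream(stream, {TEXTURE_NALU})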

To solve the problems described above, and to meet all the above-mentioned constraints, the present disclosure provides two alternative sets of methods related to PCC high-level syntax.

In the first set of methods, there is a common high-level syntax for all video codecs that can be used for coding of the geometry and texture components of PCC Cat2. This set of methods is summarized as follows.

FIG. 4 illustrates a data structure 400 compatible with PCC. The data structure 400 may represent a portion of a bitstream generated by an encoder and received by a decoder. As shown, a data unit header 404 (which may be referred to as a PCC NAL unit header) is added for each data unit 402 (which may be referred to as a PCC NAL unit). While one data unit 402 and one data unit header 404 are illustrated in the data structure 400 of FIG. 4, any number of data units 402 and data unit headers 404 may be included in the data structure 400 in practical applications. Indeed, a bitstream including the data structure 400 may contain a sequence of data units 402 each comprising a data unit header 404.

The data unit header 404 may comprise, for example, one or two bytes. In an embodiment, each data unit 402 is formed as one PCC NAL unit. The data unit 402 includes a payload 406. In an embodiment, the data unit 402 may also include a supplemental enhancement information (SEI) message, a sequence parameter set, a picture parameter set, slice information, etc.

In an embodiment, the payload 406 of the data unit 402 may be an HEVC NAL unit or an AVC NAL unit. In an embodiment, the payload 406 may contain data for a geometry component or a texture component. In an embodiment, the geometry component is a set of Cartesian coordinates associated with a point cloud frame. In an embodiment, the texture component is a set of luma sample values of a point cloud frame. When HEVC is in use, the data unit 402 may be referred to as a PCC NAL unit containing an HEVC NAL unit as the payload 406. When AVC is in use, the data unit 402 may be referred to as a PCC NAL unit containing an AVC NAL unit as the payload 406.

In an embodiment, the data unit header 404 (e.g., the PCC NAL unit header) is designed as summarized below.

First, the data unit header 404 includes a type indicator. The type indicator may be, for example, 5 bits. The type indicator specifies the type of content carried in the payload 406. For example, the type indicator may specify that the payload 406 contains geometry or texture information.

In an embodiment, some of the reserved data units (which are similar to data unit 402, but have been reserved for later use) may be used for PCC Cat13 data units. Thus, the design of the present disclosure also applies to PCC Cat13. As such, it is possible to unify PCC Cat2 and PCC Cat13 into one codec standard specification.

As noted above, the current bitstream format permits emulations of a start code pattern that signals the start of, for example, a new NAL unit or PCC NAL unit. The start code pattern may be, for example, 0x0003. Because the current bitstream format permits emulations of the start code pattern, the start code may be unintentionally signaled. The present disclosure provides PCC NAL unit syntax and semantics (see below) to resolve this issue. The PCC NAL unit syntax and semantics depicted herein ensure start code emulation prevention for each PCC NAL unit regardless of its content. Consequently, the last byte of the one-byte or two-byte data unit header 404 (e.g., the data unit header itself if it is of one byte) is prohibited from being equal to 0x00.
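A non-normative sketch of the emulation prevention mechanism follows; it mirrors the pcc_nal_unit( ) syntax given later, in which an emulation_prevention_three_byte (0x03) is inserted whenever two zero bytes would otherwise be followed by a byte in the range 0x00 to 0x03.

def add_emulation_prevention(rbsp: bytes) -> bytes:
    # Insert 0x03 after any 0x00 0x00 pair that precedes a byte <= 0x03,
    # so no start code pattern can be emulated by payload data.
    out = bytearray()
    zeros = 0
    for b in rbsp:
        if zeros >= 2 and b <= 0x03:
            out.append(0x03)  # emulation_prevention_three_byte
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0x00 else 0
    return bytes(out)

def remove_emulation_prevention(data: bytes) -> bytes:
    # Inverse operation: drop each 0x03 that follows 0x00 0x00.
    out = bytearray()
    zeros = 0
    for b in data:
        if zeros >= 2 and b == 0x03:
            zeros = 0
            continue  # skip the emulation prevention byte
        out.append(b)
        zeros = zeros + 1 if b == 0x00 else 0
    return bytes(out)

raw = bytes([0x00, 0x00, 0x01, 0x42])
protected = add_emulation_prevention(raw)  # 00 00 03 01 42
assert remove_emulation_prevention(protected) == raw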

In addition, a group of frames header 408 (a.k.a., a group of frames header NAL unit) is designed to carry the group of frames header parameters. In addition, the group of frames header NAL unit includes signaling of other global information such as, for example, the profile and level of each geometry or texture bitstream. In an embodiment, the profile is a specified subset of the syntax or a subset of coding tools. In an embodiment, the level is a defined set of constraints on the values that may be taken by the syntax elements and variables. In an embodiment, the combination of the profile and the level for a bitstream represents a particular decoding capability required for decoding of the bitstream. Furthermore, when profiles and levels are also defined for decoding of the auxiliary information, occupancy map, and the point cloud reconstruction process (which utilizes the decoding results of geometry, texture, auxiliary information, and occupancy map), that profile and level are also signaled in the group of frames header 408. In an embodiment, the PCC auxiliary information refers to information like patch information and point local reconstruction information, which is used for reconstruction of the point cloud signal from a PCC coded bitstream. In an embodiment, the PCC occupancy map refers to information on which parts of the 3D space are occupied by objects from which texture values and other attributes are sampled.

As shown by the syntax below, the constraints on the order of different types of data units 402 (a.k.a., PCC NAL units) are clearly specified. In addition, the start of an access unit 410 (which may contain several of the data units 402, data unit headers 404, etc.) is clearly specified.

In addition, the process for extraction/construction of each geometry or texture bitstream is clearly specified in the syntax and/or semantics noted below.

In the second set of methods, different overall syntaxes are used for different video codecs. PCC Cat2 using HEVC for coding of geometry and texture is specified as an amendment to HEVC, while PCC Cat2 using AVC for coding of geometry and texture is specified as an amendment to AVC. This set of methods is summarized as follows.

For PCC Cat2 using HEVC for coding of geometry and texture, geometry and texture are considered as three separate layers (e.g., two layers for geometry, d0 and d1, and one layer for texture). Either SEI messages or new types of NAL units are used for the occupancy map and the auxiliary information. Two new SEI messages, one for the occupancy map and one for the auxiliary information, are specified. Another sequence-level SEI message is specified to carry the group of frames header parameters and other global information. This SEI message is similar to the group of frames header 408 in the first set of methods.

For PCC Cat2 using AVC for coding of geometry and texture, geometry and texture are likewise considered as three separate layers (e.g., two layers for geometry, d0 and d1, and one layer for texture). Either SEI messages or new types of NAL units are used for the occupancy map and the auxiliary patch information. The extraction of an independently coded non-base layer and signaling of the conformance point (e.g., profile and level) as a single-layer bitstream are specified. Two new types of SEI messages, one for the occupancy map and one for the auxiliary information, are specified. Another sequence-level SEI message is specified to carry the group of frames header parameters and other global information. This SEI message is similar to the group of frames header 408 in the first set of methods.

The first set of methods noted above can be implemented based on the definitions, abbreviations, syntax, and semantics disclosed below. Aspects that are not specifically mentioned are the same as in the latest PCC Cat2 WD.

The following definitions apply.

bitstream: A sequence of bits that forms the representation of coded point cloud frames and associated data forming one or more CPSs.

byte: A sequence of 8 bits, within which, when written or read as a sequence of bit values, the left-most and right-most bits represent the most and least significant bits, respectively.

coded PCC sequence (CPS): A sequence of PCC AUs that consists, in decoding order, of a PCC intra random access point (IRAP) AU, followed by zero or more PCC AUs that are not PCC IRAP AUs, including all subsequent PCC AUs up to but not including any subsequent PCC AU that is a PCC IRAP AU.

decoding order: The order in which syntax elements are processed by the decoding process.

decoding process: The process specified in this specification (a.k.a., the PCC Cat2 WD) that reads a bitstream and derives decoded point cloud frames from it.

group of frames header NAL unit: A PCC NAL unit that has PccNalUnitType equal to GOF_HEADER.

PCC AU: A set of PCC NAL units that are associated with each other according to a specified classification rule, are consecutive in decoding order, and contain all PCC NAL units pertaining to one particular presentation time.

PCC IRAP AU: A PCC AU that contains a group of frames header NAL unit.

PCC NAL unit: A syntax structure containing an indication of the type of data to follow and bytes containing that data in the form of an RBSP interspersed as needed with emulation prevention bytes.

raw byte sequence payload (RBSP): A syntax structure containing an integer number of bytes that is encapsulated in a PCC NAL unit and that is either empty or has the form of a string of data bits (SODB) containing syntax elements followed by an RBSP stop bit and zero or more subsequent bits equal to 0.

raw byte sequence payload (RBSP) stop bit: A bit equal to 1 present within a RBSP after a SODB, for which the location of the end within an RBSP can be identified by searching from the end of the RBSP for the RBSP stop bit, which is the last non-zero bit in the RBSP (a short sketch of this search is given after these definitions).

SODB: A sequence of some number of bits representing syntax elements present within a RBSP prior to the RBSP stop bit, where the left-most bit is considered to be the first and most significant bit, and the right-most bit is considered to be the last and least significant bit.

syntax element: An element of data represented in the bitstream.

syntax structure: Zero or more syntax elements present together in the bitstream in a specified order.

video AU: An access unit per a particular video codec.

video NAL unit: A PCC NAL unit that has PccNalUnitType equal to GEOMETRY_D0, GEOMETRY_D1, or TEXTURE_NALU.
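As referenced in the RBSP stop bit definition above, the following non-normative sketch locates the boundary between the SODB and the trailing bits by scanning from the end of the RBSP for the last 1 bit.

def sodb_bit_length(rbsp: bytes) -> int:
    # Return the number of SODB bits: everything before the RBSP stop
    # bit, which is the last non-zero bit in the RBSP.
    for byte_index in range(len(rbsp) - 1, -1, -1):
        b = rbsp[byte_index]
        if b:
            stop_bit_pos = 0  # least significant set bit is the stop bit
            while not (b >> stop_bit_pos) & 1:
                stop_bit_pos += 1
            return byte_index * 8 + (7 - stop_bit_pos)
    raise ValueError("no RBSP stop bit found")

# 0xA0 = 1010 0000: the SODB is the two bits '10', the third bit is the
# RBSP stop bit, and the remaining five bits are trailing zeros.
assert sodb_bit_length(bytes([0xA0])) == 2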

The following abbreviations apply:

AU Access Unit

CPS Coded PCC Sequence

IRAP Intra Random Access Point

NAL Network Abstraction Layer

PCC Point Cloud Coding

RBSP Raw Byte Sequence Payload

SODB String Of Data Bits

The following provides the syntax, semantics, and sub-bitstream extraction process. In that regard, the syntax in clause 7.3 of the latest PCC Cat2 WD is replaced by the following.

The PCC NAL unit syntax is provided. In particular, the general PCC NAL unit syntax is as follows.

pcc_nal_unit( NumBytesInNalUnit ) {                                Descriptor
  pcc_nal_unit_header( )
  NumBytesInRbsp = 0
  for( i = 1; i < NumBytesInNalUnit; i++ )
    if( i + 2 < NumBytesInNalUnit && next_bits( 24 ) = = 0x000003 ) {
      rbsp_byte[ NumBytesInRbsp++ ]                                b(8)
      rbsp_byte[ NumBytesInRbsp++ ]                                b(8)
      i += 2
      emulation_prevention_three_byte /* equal to 0x03 */          f(8)
    } else
      rbsp_byte[ NumBytesInRbsp++ ]                                b(8)
}

The PCC NAL unit header syntax is as follows.

pcc_nal_unit_header( ) {                                           Descriptor
  forbidden_zero_bit                                               f(1)
  pcc_nuh_reserved_zero_2bits                                      f(2)
  pcc_nal_unit_type_plus1                                          u(5)
}
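A non-normative sketch of packing and parsing this one-byte header: one forbidden zero bit, two reserved bits, and the five-bit pcc_nal_unit_type_plus1 field from which PccNalUnitType is derived, per equation (7-1) below, as pcc_nal_unit_type_plus1 minus 1.

def pack_pcc_nal_unit_header(pcc_nal_unit_type: int) -> int:
    # forbidden_zero_bit (1) | pcc_nuh_reserved_zero_2bits (2) | type_plus1 (5)
    type_plus1 = pcc_nal_unit_type + 1
    assert 1 <= type_plus1 <= 31  # five-bit field, coded as type + 1
    # Because type_plus1 >= 1 and the other bits are zero, a conforming
    # one-byte header can never be equal to 0x00.
    return (0 << 7) | (0 << 5) | type_plus1

def parse_pcc_nal_unit_header(header_byte: int) -> int:
    assert (header_byte >> 7) == 0  # forbidden_zero_bit shall be 0
    # Decoders ignore the reserved bits (bits 6..5).
    return (header_byte & 0x1F) - 1  # PccNalUnitType

header = pack_pcc_nal_unit_header(5)
assert parse_pcc_nal_unit_header(header) == 5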

The raw byte sequence payloads, trailing bits, and byte alignment syntax is provided.

In particular, the group of frames header RBSP syntax is as follows.

group_of_frames_header_rbsp( ) {                                   Descriptor
  identified_codec                                                 u(8)
  pcc_profile_level( )
  frame_width                                                      u(16)
  frame_height                                                     u(16)
  occupancy_resolution                                             u(8)
  radius_to_smoothing                                              u(8)
  neighbor_count_smoothing                                         u(8)
  radius2_boundary_detection                                       u(8)
  threshold_smoothing                                              u(8)
  lossless_geometry                                                u(8)
  lossless_texture                                                 u(8)
  no_attributes                                                    u(8)
  lossless_geometry_444                                            u(8)
  absolute_d1_coding                                               u(8)
  binary_arithmetic_coding                                         u(8)
  gof_header_extension_flag                                        u(1)
  if( gof_header_extension_flag )
    while( more_rbsp_data( ) )
      gof_header_extension_data_flag                               u(1)
  rbsp_trailing_bits( )
}

The auxiliary information frame RBSP syntax is as follows.

auxiliary_information_frame_rbsp( ) {                                    Descriptor
    patch_count                                                          u(32)
    occupancy_precision                                                  u(8)
    max_candidate_count                                                  u(8)
    bit_count_u0                                                         u(8)
    bit_count_v0                                                         u(8)
    bit_count_u1                                                         u(8)
    bit_count_v1                                                         u(8)
    bit_count_d1                                                         u(8)
    occupancy_aux_stream_size                                            u(32)
    for( i = 0; i < patchCount; i++ ) {
        patchList[ i ].patch_u0                                          ae(v)
        patchList[ i ].patch_v0                                          ae(v)
        patchList[ i ].patch_u1                                          ae(v)
        patchList[ i ].patch_v1                                          ae(v)
        patchList[ i ].patch_d1                                          ae(v)
        patchList[ i ].delta_size_u0                                     se(v)
        patchList[ i ].delta_size_v0                                     se(v)
        patchList[ i ].normal_axis                                       ae(v)
    }
    for( i = 0; i < blockCount; i++ ) {
        if( candidatePatches[ i ].size( ) = = 1 )
            blockToPatch[ i ] = candidatePatches[ i ][ 0 ]
        else {
            candidate_index                                              ae(v)
            if( candidate_index = = max_candidate_count )
                blockToPatch[ i ] = patch_index                          ae(v)
            else
                blockToPatch[ i ] = candidatePatches[ i ][ candidate_index ]
        }
    }
    rbsp_trailing_bits( )
}

The occupancy map frame RBSP syntax is as follows.

occupancy_map_frame_rbsp( ) {                                            Descriptor
    for( i = 0; i < blockCount; i++ ) {
        if( blockToPatch[ i ] ) {
            is_full                                                      ae(v)
            if( !is_full ) {
                best_traversal_order_index                               ae(v)
                run_count_prefix                                         ae(v)
                if( run_count_prefix > 0 )
                    run_count_suffix                                     ae(v)
                occupancy                                                ae(v)
                for( j = 0; j <= runCountMinusTwo + 1; j++ )
                    run_length_idx                                       ae(v)
            }
        }
    }
    rbsp_trailing_bits( )
}

The RBSP trailing bits syntax in clause 7.3.2.11 of the HEVC specification applies. Likewise, the byte alignment syntax in clause 7.3.2.12 of the HEVC specification applies. The PCC profile and level syntax is as follows.

pcc_profile_level( ) {                                                   Descriptor
    pcc_profile_idc                                                      u(5)
    pcc_pl_reserved_zero_19bits                                          u(19)
    pcc_level_idc                                                        u(8)
    if( identified_codec = = CODEC_HEVC ) {
        hevc_ptl_12bytes_geometry                                        u(96)
        hevc_ptl_12bytes_texture                                         u(96)
    } else if( identified_codec = = CODEC_AVC ) {
        avc_pl_3bytes_geometry                                           u(24)
        avc_pl_3bytes_texture                                            u(24)
    }
}

The semantics in clause 7.4 of the latest PCC Cat2 WD are replaced by the following and its sub-clauses.

In general, semantics associated with the syntax structures and with the syntax elements within these structures are specified in this subclause. When the semantics of a syntax element are specified using a table or a set of tables, any values that are not specified in the table(s) shall not be present in the bitstream unless otherwise specified.

The PCC NAL unit semantics are discussed. For the general PCC NAL unit semantics, the general NAL unit semantics in clause 7.4.2.1 of the HEVC specification apply. The PCC NAL unit header semantics are as follows.

forbidden_zero_bit shall be equal to 0.

pcc_nuh_reserved_zero_2bits shall be equal to 0 in bitstreams conforming to this version of this specification. Other values for pcc_nuh_reserved_zero_2bits are reserved for future use by ISO/IEC. Decoders shall ignore the value of pcc_nuh_reserved_zero_2bits.

pcc_nal_unit_type_plus1 minus 1 specifies the value of the variable PccNalUnitType, which specifies the type of RBSP data structure contained in the PCC NAL unit as specified in Table 1 (see below). The variable PccNalUnitType is specified as follows:

PccNalUnitType = pcc_nal_unit_type_plus1 − 1   (7-1)

PCC NAL units that have PccNalUnitType in the range of UNSPEC25..UNSPEC30, inclusive, for which semantics are not specified, shall not affect the decoding process specified in this specification.

NOTE 1—PCC NAL unit types in the range of UNSPEC25..UNSPEC30 may be used as determined by the application. No decoding process for these values of PccNalUnitType is specified in this specification. Since different applications might use these PCC NAL unit types for different purposes, particular care must be exercised in the design of encoders that generate PCC NAL units with these PccNalUnitType values, and in the design of decoders that interpret the content of PCC NAL units with these PccNalUnitType values. This specification does not define any management for these values. These PccNalUnitType values might only be suitable for use in contexts in which “collisions” of usage (e.g., different definitions of the meaning of the PCC NAL unit content for the same PccNalUnitType value) are unimportant, or not possible, or are managed—e.g., defined or managed in the controlling application or transport specification, or by controlling the environment in which bitstreams are distributed.

For purposes other than determining the amount of data in the decoding units of the bitstream, decoders shall ignore (remove from the bitstream and discard) the contents of all PCC NAL units that use reserved values of PccNalUnitType.

NOTE 2—This requirement allows future definition of compatible extensions to this specification.

TABLE 1 PCC NAL unit type codes

PccNalUnitType   Name of PccNalUnitType   Content of PCC NAL unit and/or RBSP syntax structure
0                GOF_HEADER               Group of frames header; group_of_frames_header_rbsp( )
1                AUX_INFO                 Auxiliary information frame; auxiliary_info_frame_rbsp( )
2                OCP_MAP                  Occupancy map frame; occupancy_map_frame_rbsp( )
3                GEOMETRY_D0              The payload of this PCC NAL unit contains a NAL unit of the geometry d0 component per the identified video codec.
4                GEOMETRY_D1              The payload of this PCC NAL unit contains a NAL unit of the geometry d1 component per the identified video codec.
5                TEXTURE_NALU             The payload of this PCC NAL unit contains a NAL unit of the texture component per the identified video codec.
6..24            RSV_6..RSV_24            Reserved
25..30           UNSPEC25..UNSPEC30       Unspecified
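
For reference, the type codes of Table 1 map naturally onto a C enumeration; the enumeration name below is an assumption, not WD text:

    /* PccNalUnitType codes per Table 1. */
    typedef enum {
        GOF_HEADER   = 0,   /* group_of_frames_header_rbsp( ) */
        AUX_INFO     = 1,   /* auxiliary_info_frame_rbsp( ) */
        OCP_MAP      = 2,   /* occupancy_map_frame_rbsp( ) */
        GEOMETRY_D0  = 3,   /* geometry d0 NAL unit per the identified codec */
        GEOMETRY_D1  = 4,   /* geometry d1 NAL unit per the identified codec */
        TEXTURE_NALU = 5,   /* texture NAL unit per the identified codec */
        RSV_6        = 6,   /* 6..24 reserved */
        RSV_24       = 24,
        UNSPEC25     = 25,  /* 25..30 unspecified */
        UNSPEC30     = 30
    } PccNalUnitTypeEnum;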

NOTE 3—The identified video codec (e.g., HEVC or AVC) is indicated in the group of frames header NAL unit that is present in the first PCC AU of each CPS.

Encapsulation of an SODB within an RBSP (informative) is provided. In that regard, clause 7.4.2.3 of the HEVC specification applies.

The order of PCC NAL units and association to AUs and CPSs is provided. In general, this clause specifies constraints on the order of PCC NAL units in the bitstream.

Any order of PCC NAL units in the bitstream obeying these constraints is referred to in the text as the decoding order of PCC NAL units. Within a PCC NAL unit that is not a video NAL unit, the syntax in clause 7.3 specifies the decoding order of syntax elements. Within a video NAL unit, the syntax specified in the specification of the identified video codec specifies the decoding order of syntax elements. Decoders are capable of receiving PCC NAL units and their syntax elements in decoding order.

The order of PCC NAL units and their association to PCC AUs is provided.

This clause specifies the order of PCC NAL units and their association to PCC AUs.

A PCC AU consists of zero or one group of frames header NAL unit, one geometry d0 video AU, one geometry d1 video AU, one auxiliary information frame NAL unit, one occupancy map frame NAL unit, and one texture video AU, in the order listed.

Association of NAL units to a video AU and the order of NAL units within a video AU are specified in the specification of the identified video codec, e.g., HEVC or AVC. The identified video codec is indicated in the group of frames header NAL unit that is present in the first PCC AU of each CPS.

The first PCC AU of each CPS starts with a group of frames header NAL unit, and each group of frames header NAL unit specifies the start of a new PCC AU.

Other PCC AUs start with the PCC NAL unit that contains the first NAL unit of a geometry d0 video AU. In other words, the PCC NAL unit that contains the first NAL unit of a geometry d0 video AU, when not preceded by a group of frames header NAL unit, starts a new PCC AU.
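
These two rules mean that AU boundaries are detectable from PccNalUnitType alone, together with the video codec's own notion of where a video AU begins. A minimal C sketch of the boundary test follows, reusing the enumeration sketched after Table 1; the helper name and the caller-computed starts_geometry_d0_video_au flag are assumptions:

    #include <stdbool.h>

    /* Returns true if the current PCC NAL unit starts a new PCC AU.
     * starts_geometry_d0_video_au must be derived per the identified video
     * codec (e.g., HEVC or AVC); prev_was_gof_header records whether the
     * preceding PCC NAL unit was a group of frames header NAL unit. */
    bool starts_new_pcc_au(int pcc_nal_unit_type,
                           bool starts_geometry_d0_video_au,
                           bool prev_was_gof_header)
    {
        if (pcc_nal_unit_type == GOF_HEADER)  /* every GOF header starts an AU */
            return true;
        if (pcc_nal_unit_type == GEOMETRY_D0 && starts_geometry_d0_video_au)
            return !prev_was_gof_header;      /* unless the AU already started */
        return false;
    }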

The order of PCC AUs and their association to CPSs is provided.

A bitstream conforming to this specification consists of one or more CPSs.

A CPS consists of one or more PCC AUs. The order of PCC NAL units and their association to PCC AUs is described in clause 7.4.2.4.2.

The first PCC AU of a CPS is a PCC IRAP AU.

The raw byte sequence payloads, trailing bits, and byte alignment semantics are provided. The group of frames header RBSP semantics are as follows.

identified_codec specifies the identified video codec used for coding of the geometry and texture components as shown in Table 2.

TABLE 2 Specification of identified_codec

identified_codec   Name of identified_codec      The identified video codec
0                  CODEC_HEVC                    ISO/IEC IS 23008-2 (HEVC)
1                  CODEC_AVC                     ISO/IEC IS 14496-10 (AVC)
2..63              CODEC_RSV_2..CODEC_RSV_63     Reserved
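
These values can likewise be mirrored as a small enumeration (an illustrative convenience, not WD text):

    /* identified_codec values per Table 2. */
    typedef enum {
        CODEC_HEVC  = 0,  /* ISO/IEC IS 23008-2 */
        CODEC_AVC   = 1,  /* ISO/IEC IS 14496-10 */
        CODEC_RSV_2 = 2   /* 2..63 reserved */
    } IdentifiedCodec;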

frame_width indicates the frame width, in pixels, of the geometry and texture videos. It shall be a multiple of occupancyResolution.

frame_height indicates the frame height, in pixels, of the geometry and texture videos. It shall be a multiple of occupancyResolution.

occupancy_resolution indicates the horizontal and vertical resolution, in pixels, at which patches are packed in the geometry and texture videos. It shall be an even multiple of occupancyPrecision.

radius_to_smoothing indicates the radius to detect neighbors for smoothing. The value of radius_to_smoothing shall be in the range of 0 to 255, inclusive.

neighbor_count_smoothing indicates the maximum number of neighbors used for smoothing. The value of neighbor_count_smoothing shall be in the range of 0 to 255, inclusive.

radius2_boundary_detection indicates the radius for boundary point detection. The value of radius2_boundary_detection shall be in the range of 0 to 255, inclusive.

threshold_smoothing indicates the smoothing threshold. The value of threshold_smoothing shall be in the range of 0 to 255, inclusive.

lossless_geometry indicates lossless geometry coding. The value of lossless_geometry equal to 1 indicates that point cloud geometry information is coded losslessly. The value of lossless_geometry equal to 0 indicates that point cloud geometry information is coded in a lossy manner.

lossless_texture indicates lossless texture encoding. The value of lossless_texture equal to 1 indicates that point cloud texture information is coded losslessly. The value of lossless_texture equal to 0 indicates that point cloud texture information is coded in a lossy manner.

no_attributes indicates whether attributes are coded along with geometry data. The value of no_attributes equal to 1 indicates that the coded point cloud bitstream does not contain any attributes information. The value of no_attributes equal to 0 indicates that the coded point cloud bitstream contains attributes information.

lossless_geometry_444 indicates whether to use the 4:2:0 or 4:4:4 video format for geometry frames. The value of lossless_geometry_444 equal to 1 indicates that the geometry video is coded in 4:4:4 format. The value of lossless_geometry_444 equal to 0 indicates that the geometry video is coded in 4:2:0 format.

absolute_d1_coding indicates how the geometry layers other than the layer nearest to the projection plane are coded. absolute_d1_coding equal to 1 indicates that the actual geometry values are coded for the geometry layers other than the layer nearest to the projection plane. absolute_d1_coding equal to 0 indicates that the geometry layers other than the layer nearest to the projection plane are coded differentially.

binary_arithmetic_coding indicates whether binary arithmetic coding is used. The value of binary_arithmetic_coding equal to 1 indicates that binary arithmetic coding is used for all the syntax elements. The value of binary_arithmetic_coding equal to 0 indicates that non-binary arithmetic coding is used for some syntax elements.

gof_header_extension_flag equal to 0 specifies that no gof_header_extension_data_flag syntax elements are present in the group of frames header RBSP syntax structure. gof_header_extension_flag equal to 1 specifies that there are gof_header_extension_data_flag syntax elements present in the group of frames header RBSP syntax structure. Decoders shall ignore all data that follow the value 1 for gof_header_extension_flag in a group of frames header NAL unit.

gof_header_extension_data_flag may have any value. Its presence and value do not affect decoder conformance. Decoders shall ignore all gof_header_extension_data_flag syntax elements.

The auxiliary information frame RBSP semantics are provided.

patch_count is the number of patches in the geometry and texture videos. It shall be larger than 0.

occupancy_precision is the horizontal and vertical resolution, in pixels, of the occupancy map. It corresponds to the sub-block size for which occupancy is signaled. To achieve lossless coding of the occupancy map, this should be set to 1.

max_candidate_count specifies the maximum number of candidates in the patch candidate list.

bit_count_u0 specifies the number of bits for fixed-length coding of patch_u0.

bit_count_v0 specifies the number of bits for fixed-length coding of patch_v0.

bit_count_u1 specifies the number of bits for fixed-length coding of patch_u1.

bit_count_v1 specifies the number of bits for fixed-length coding of patch_v1.

bit_count_d1 specifies the number of bits for fixed-length coding of patch_d1.

occupancy_aux_stream_size is the number of bytes used for coding patch information and the occupancy map.

The following syntax elements are specified once per patch.

patch_u0 specifies the x-coordinate of the top-left corner sub-block of size occupancy_resolution × occupancy_resolution of the patch bounding box. The value of patch_u0 shall be in the range of 0 to frame_width / occupancy_resolution − 1, inclusive.

patch_v0 specifies the y-coordinate of the top-left corner sub-block of size occupancy_resolution × occupancy_resolution of the patch bounding box. The value of patch_v0 shall be in the range of 0 to frame_height / occupancy_resolution − 1, inclusive.

patch_u1 specifies the minimum x-coordinate of the 3D bounding box of patch points. The value of patch_u1 shall be in the range of 0 to frame_width − 1, inclusive.

patch_v1 specifies the minimum y-coordinate of the 3D bounding box of patch points. The value of patch_v1 shall be in the range of 0 to frame_height − 1, inclusive.

patch_d1 specifies the minimum depth of the patch. The value of patch_d1 shall be in the range of 0 to <255?>, inclusive.

delta_size_u0 is the difference in patch width between the current patch and the previous one. The value of delta_size_u0 shall be in the range of <−65536?> to <65535?>, inclusive.

delta_size_v0 is the difference in patch height between the current patch and the previous one. The value of delta_size_v0 shall be in the range of <−65536?> to <65535?>, inclusive.

normal_axis specifies the plane projection index. The value of normal_axis shall be in the range of 0 to 2, inclusive. normal_axis values of 0, 1, and 2 correspond to the X, Y, and Z projection axes, respectively.

The following syntax elements are specified once per block.

candidate_index is the index into the patch candidate list. The value of candidate_index shall be in the range of 0 to max_candidate_count, inclusive.

patch_index is an index to a sorted patch list, in descending size order, associated with a frame.

The occupancy map frame RBSP semantics are provided.

The following syntax elements are provided for non-empty blocks.

is_full specifies whether the current occupancy block of size occupancy_resolution × occupancy_resolution is full. is_full equal to 1 specifies that the current block is full. is_full equal to 0 specifies that the current occupancy block is not full.

best_traversal_order_index specifies the scan order for sub-blocks of size occupancy_precision × occupancy_precision in the current occupancy_resolution × occupancy_resolution block. The value of best_traversal_order_index shall be in the range of 0 to 4, inclusive.

run_count_prefix is used in the derivation of variable runCountMinusTwo.

run_count_suffix is used in the derivation of variable runCountMinusTwo. When not present, the value of run_count_suffix is inferred to be equal to 0.

When the value of blockToPatch for a particular block is not equal to zero and the block is not full, runCountMinusTwo plus 2 represents the number of signalled runs for a block. The value of runCountMinusTwo shall be in the range of 0 to ( occupancy_resolution * occupancy_resolution ) − 1, inclusive.

runCountMinusTwo is derived as follows:

runCountMinusTwo = ( 1 << run_count_prefix ) − 1 + run_count_suffix   (7-85)
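
As a worked check of Equation (7-85): run_count_prefix equal to 3 and run_count_suffix equal to 2 give runCountMinusTwo = (1 << 3) − 1 + 2 = 9, i.e., 11 signalled runs. A one-line C helper (name hypothetical):

    /* Derivation (7-85). Example: prefix 3, suffix 2 -> (1 << 3) - 1 + 2 = 9. */
    static inline int run_count_minus_two(int run_count_prefix,
                                          int run_count_suffix)
    {
        return (1 << run_count_prefix) - 1 + run_count_suffix;
    }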

occupancy specifies the occupancy value for the first sub-block (of occupancyPrecision × occupancyPrecision pixels). occupancy equal to 0 specifies that the first sub-block is empty. occupancy equal to 1 specifies that the first sub-block is occupied.

run_length_idx is an indication of the run length. The value of run_length_idx shall be in the range of 0 to 14, inclusive.

The variable runLength is derived from run_length_idx by using Table 3.

TABLE 3 Derivation of runLength from run_length_idx

run_length_idx   runLength      run_length_idx   runLength
0                0              8                13
1                1              9                9
2                2              10               6
3                3              11               10
4                7              12               12
5                11             13               4
6                14             14               8
7                5
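
In an implementation, Table 3 reduces to a lookup array; the array name below is an assumption:

    /* runLength per Table 3, indexed by run_length_idx (0..14). */
    static const int kRunLength[15] = {
        0, 1, 2, 3, 7, 11, 14, 5, 13, 9, 6, 10, 12, 4, 8
    };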

NOTE—The occupancy map is shared by both geometry and texture video.

The RBSP trailing bits semantics in clause 7.4.3.11 of the HEVC specification apply. The byte alignment semantics in clause 7.4.3.12 of the HEVC specification also apply. The PCC profile and level semantics are as follows.

pcc_profile_idc indicates a profile to which the CPS conforms as specified in Annex A. Bitstreams shall not contain values of pcc_profile_idc other than those specified in Annex A. Other values of pcc_profile_idc are reserved for future use by ISO/IEC.

pcc_pl_reserved_zero_19bits shall be equal to 0 in bitstreams conforming to this version of this specification. Other values for pcc_pl_reserved_zero_19bits are reserved for future use by ISO/IEC. Decoders shall ignore the value of pcc_pl_reserved_zero_19bits.

pcc_level_idc indicates a level to which the CPS conforms as specified in Annex A. Bitstreams shall not contain values of pcc_level_idc other than those specified in Annex A. Other values of pcc_level_idc are reserved for future use by ISO/IEC.

hevc_ptl_12bytes_geometry shall be equal to the value of the 12 bytes from general_profile_idc to general_level_idc, inclusive, in the active SPS when a geometry HEVC bitstream extracted as specified in clause 10 is decoded by a conforming HEVC decoder.

hevc_ptl_12bytes_texture shall be equal to the value of the 12 bytes from general_profile_idc to general_level_idc, inclusive, in the active SPS when a texture HEVC bitstream extracted as specified in clause 10 is decoded by a conforming HEVC decoder.

avc_pl_3bytes_geometry shall be equal to the value of the 3 bytes from profile_idc to level_idc, inclusive, in the active SPS when a geometry AVC bitstream extracted as specified in clause 10 is decoded by a conforming AVC decoder.

avc_pl_3bytes_texture shall be equal to the value of the 3 bytes from profile_idc to level_idc, inclusive, in the active SPS when a texture AVC bitstream extracted as specified in clause 10 is decoded by a conforming AVC decoder.

The sub-bitstream extraction process in clause 10.4 of the latest PCC Cat2 WD is replaced by the following. For the sub-bitstream extraction process, the inputs are a bitstream and a target video component indication of the geometry d0, geometry d1, or texture component. The output of this process is a sub-bitstream.

In an embodiment, it is a requirement of bitstream conformance for the input bitstream that any output sub-bitstream that is the output of the process specified in this clause with a conforming PCC bitstream and any value of the target video component indication shall be a conforming video bitstream per the identified video codec.

The output sub-bitstream is derived by the following ordered steps.

Depending on the value of the target video component indication, the following applies.

If the geometry d0 component is indicated, remove all PCC NAL units with PccNalUnitType not equal to GEOMETRY_D0.

Otherwise, if the geometry d1 component is indicated, remove all PCC NAL units with PccNalUnitType not equal to GEOMETRY_D1.

Otherwise (the texture component is indicated), remove all PCC NAL units with PccNalUnitType not equal to TEXTURE_NALU.

For each PCC NAL unit, remove the first byte.
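
A minimal C sketch of these ordered steps is given below; the PccNalUnit type and function name are assumptions, and the output array is assumed to be caller-allocated with at least n entries:

    #include <stddef.h>
    #include <stdint.h>

    typedef struct {
        const uint8_t *bytes;  /* PCC NAL unit, starting with its 1-byte header */
        size_t         size;
    } PccNalUnit;

    /* Keeps only PCC NAL units whose PccNalUnitType equals target_type
     * (GEOMETRY_D0, GEOMETRY_D1, or TEXTURE_NALU per Table 1) and strips the
     * first byte (the PCC NAL unit header) from each. Returns the count. */
    size_t extract_sub_bitstream(const PccNalUnit *in, size_t n,
                                 int target_type, PccNalUnit *out)
    {
        size_t m = 0;
        for (size_t i = 0; i < n; i++) {
            int type = (in[i].bytes[0] & 0x1F) - 1;  /* per Equation (7-1) */
            if (type != target_type)
                continue;                    /* remove all other PCC NAL units */
            out[m].bytes = in[i].bytes + 1;  /* remove the first byte */
            out[m].size  = in[i].size - 1;
            m++;
        }
        return m;
    }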

Another embodiment is provided below.

In another embodiment of the first set of methods as summarized above, the PCC NAL unit header (e.g., the data unit header 404 in FIG. 4) is designed such that the codec used for coding of the geometry and texture components can be inferred from the PCC NAL unit type. For example, the PCC NAL unit header is designed as summarized below:

In the PCC NAL unit header, there is a type indicator, e.g., 7 bits, that specifies the type of content carried in the PCC NAL unit payload. The type is determined, for example, according to the following:

0: The payload contains an HEVC NAL unit

1: The payload contains an AVC NAL unit

2..63: Reserved

64: Group of frames header NAL unit

65: Auxiliary information NAL unit

66: Occupancy map NAL unit

67..126: Reserved

PCC NAL units with PCC NAL unit type in the range of 0 to 63, inclusive, are referred to as video NAL units.
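
Under this alternative design, both the content class and the codec follow directly from the 7-bit type value; a minimal C sketch of the classification (function names are hypothetical) is:

    #include <stdbool.h>

    /* Types 0..63 carry video NAL units; 64..66 carry PCC-specific data. */
    static bool is_video_nal_unit(unsigned type7) { return type7 <= 63; }

    /* Returns 0 for an HEVC payload, 1 for an AVC payload, or -1 when the
     * codec cannot be inferred (reserved or non-video types). */
    static int inferred_codec(unsigned type7)
    {
        if (type7 == 0) return 0;  /* payload contains an HEVC NAL unit */
        if (type7 == 1) return 1;  /* payload contains an AVC NAL unit */
        return -1;
    }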

It is possible to use some of the reserved PCC NAL unit types for PCC Cat13 data units, thus unifying PCC Cat2 and PCC Cat13 into one standard specification.

FIG. 5 is an embodiment of method 500 of point cloud coding implemented by a video decoder (e.g., video decoder 30). The method 500 may be performed to solve one or more of the aforementioned problems associated with point cloud coding.

In block 502, an encoded bitstream (e.g., the data structure 400) including a data unit header (e.g., the data unit header 404) and a data unit (e.g., the data unit 402) is received. The data unit header contains a type indicator specifying a type of content carried in a payload (e.g., the payload 406) of the data unit.

In block 504, the encoded bitstream is decoded. The decoded bitstream may be utilized to generate an image or video for display to a user on a display device.

In an embodiment, the data unit header is a PCC network abstraction layer (NAL) unit header. In an embodiment, the data unit is a PCC NAL unit. In an embodiment, the indicator specifies that the type of content is a geometry component. In an embodiment, the indicator specifies that the type of content is a texture component. In an embodiment, the indicator specifies that the type of content is a geometry component or a texture component.

In an embodiment, the indicator specifies that the type of content is auxiliary information. In an embodiment, the indicator specifies that the type of content is an occupancy map.

In an embodiment, the payload comprises a High Efficiency Video Coding (HEVC) NAL unit. In an embodiment, the payload comprises an Advanced Video Coding (AVC) NAL unit. In an embodiment, the type indicator comprises five bits or seven bits.

FIG. 6 is an embodiment of method 600 of point cloud coding implemented by a video encoder (e.g., video encoder 20). The method 600 may be performed to solve one or more of the aforementioned problems associated with point cloud coding.

In block 602, an encoded bitstream (e.g., the data structure 400) including a data unit header (e.g., the data unit header 404) and a data unit (e.g., the data unit 402) is generated. The data unit header contains a type indicator specifying a type of content carried in a payload (e.g., the payload 406) of the data unit.

In block 604, the encoded bitstream is transmitted toward a decoder (e.g., video decoder 30). Once received by the decoder, the encoded bitstream may be decoded to generate an image or video for display to a user on a display device.

In an embodiment, the data unit header is a PCC network abstraction layer (NAL) unit header. In an embodiment, the data unit is a PCC NAL unit. In an embodiment, the indicator specifies that the type of content is a geometry component. In an embodiment, the indicator specifies that the type of content is a texture component. In an embodiment, the indicator specifies that the type of content is a geometry component or a texture component.

In an embodiment, the indicator specifies that the type of content is auxiliary information. In an embodiment, the indicator specifies that the type of content is an occupancy map.

In an embodiment, the payload comprises a High Efficiency Video Coding (HEVC) NAL unit. In an embodiment, the payload comprises an Advanced Video Coding (AVC) NAL unit. In an embodiment, the type indicator comprises five bits or seven bits.

FIG. 7 is a schematic diagram of a video coding device 700 (e.g., a video coder 20, a video decoder 30, etc.) according to an embodiment of the disclosure. The video coding device 700 is suitable for implementing the methods and processes disclosed herein. The video coding device 700 comprises ingress ports 710 and receiver units (Rx) 720 for receiving data; a processor, logic unit, or central processing unit (CPU) 730 to process the data; transmitter units (Tx) 740 and egress ports 750 for transmitting the data; and a memory 760 for storing the data. The video coding device 700 may also comprise optical-to-electrical (OE) components and electrical-to-optical (EO) components coupled to the ingress ports 710, the receiver units 720, the transmitter units 740, and the egress ports 750 for egress or ingress of optical or electrical signals.

The processor 730 is implemented by hardware and software. The processor 730 may be implemented as one or more CPU chips, cores (e.g., as a multi-core processor), field-programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), and digital signal processors (DSPs). The processor 730 is in communication with the ingress ports 710, receiver units 720, transmitter units 740, egress ports 750, and memory 760. The processor 730 comprises a coding module 770. The coding module 770 implements the disclosed embodiments described above. The inclusion of the coding module 770 therefore provides a substantial improvement to the functionality of the coding device 700 and effects a transformation of the video coding device 700 to a different state. Alternatively, the coding module 770 is implemented as instructions stored in the memory 760 and executed by the processor 730.

The video coding device 700 may also include input and/or output (I/O) devices 780 for communicating data to and from a user. The I/O devices 780 may include output devices such as a display for displaying video data, speakers for outputting audio data, etc. The I/O devices 780 may also include input devices, such as a keyboard, mouse, trackball, etc., and/or corresponding interfaces for interacting with such output devices.

The memory 760 comprises one or more disks, tape drives, and solid-state drives and may be used as an overflow data storage device, to store programs when such programs are selected for execution, and to store instructions and data that are read during program execution. The memory 760 may be volatile and non-volatile and may be read-only memory (ROM), random-access memory (RAM), ternary content-addressable memory (TCAM), and static random-access memory (SRAM).

FIG. 8 is a schematic diagram of an embodiment of a means for coding 800. In an embodiment, the means for coding 800 is implemented in a video coding device 802 (e.g., a video encoder 20 or a video decoder 30). The video coding device 802 includes receiving means 801. The receiving means 801 is configured to receive a picture to encode or to receive a bitstream to decode. The video coding device 802 includes transmission means 807 coupled to the receiving means 801. The transmission means 807 is configured to transmit the bitstream to a decoder or to transmit a decoded image to a display means (e.g., one of the I/O devices 780).

The video coding device 802 includes a storage means 803. The storage means 803 is coupled to at least one of the receiving means 801 or the transmission means 807. The storage means 803 is configured to store instructions. The video coding device 802 also includes processing means 805. The processing means 805 is coupled to the storage means 803. The processing means 805 is configured to execute the instructions stored in the storage means 803 to perform the methods disclosed herein.

While several embodiments have been provided in the present disclosure, it may be understood that the disclosed systems and methods might be embodied in many other specific forms without departing from the spirit or scope of the present disclosure. The present examples are to be considered as illustrative and not restrictive, and the intention is not to be limited to the details given herein. For example, the various elements or components may be combined or integrated in another system or certain features may be omitted, or not implemented.

In addition, techniques, systems, subsystems, and methods described and illustrated in the various embodiments as discrete or separate may be combined or integrated with other systems, components, techniques, or methods without departing from the scope of the present disclosure. Other examples of changes, substitutions, and alterations are ascertainable by one skilled in the art and may be made without departing from the spirit and scope disclosed herein.

What is claimed is:
 1. A method of point cloud coding (PCC) implemented by a video decoder, comprising: receiving an encoded bitstream including a unit header, the unit header containing a type indicator specifying a type of content carried in a payload; and decoding the encoded bitstream.
 2. The method of claim 1, wherein the unit header is a PCC network abstraction layer (NAL) unit header, wherein the encoded bitstream further comprises a data unit, and wherein the data unit is a PCC network abstraction layer (NAL) unit.
 3. The method of claim 1, wherein the type indicator specifies that the type of content is a geometry component or a texture component.
 4. The method of claim 1, wherein the type indicator specifies that the type of content is auxiliary information or an occupancy map.
 5. The method of claim 1, wherein the payload comprises a High Efficiency Video Coding (HEVC) network abstraction layer (NAL) unit or an Advanced Video Coding (AVC) network abstraction layer (NAL) unit.
 6. The method of claim 1, wherein the type indicator consists of five bits.
 7. The method of claim 1, wherein the type indicator specifies that the type of content is a geometry component, and wherein the geometry component comprises a set of coordinates associated with a point cloud frame, and wherein the set of coordinates are Cartesian coordinates.
 8. The method of claim 1, wherein the type indicator specifies that the type of content is a texture component, and wherein the texture component comprises a set of luma sample values of a point cloud frame.
 9. A method of point cloud coding (PCC) implemented by a video encoder, comprising: generating an encoded bitstream including a unit header, the unit header containing a type indicator specifying a type of content carried in a payload; and transmitting the encoded bitstream toward a decoder.
 10. The method of claim 9, wherein the unit header is a PCC network abstraction layer (NAL) unit header, wherein the encoded bitstream further comprises a data unit, and wherein the data unit is a PCC network abstraction layer (NAL) unit.
 11. The method of claim 9, wherein the type indicator specifies that the type of content is auxiliary information or an occupancy map.
 12. The method of claim 9, wherein the payload comprises a High Efficiency Video Coding (HEVC) network abstraction layer (NAL) unit or an Advanced Video Coding (AVC) network abstraction layer (NAL) unit.
 13. The method of claim 9, wherein the type indicator specifies that the type of content is a geometry component, and wherein the geometry component comprises a set of coordinates associated with a point cloud frame, and wherein the set of coordinates are Cartesian coordinates.
 14. The method of claim 9, wherein the type indicator specifies that the type of content is a texture component, and wherein the texture component comprises a set of luma sample values of a point cloud frame.
 15. A coding apparatus, comprising: a receiver configured to receive a bitstream including a unit header, the unit header containing a type indicator specifying a type of content carried in a payload; a memory storing instructions; a processor coupled to the memory, the processor configured to execute the instructions to cause the coding apparatus to decode the bitstream; and a display configured to display an image obtained from the bitstream as decoded.
 16. The coding apparatus of claim 15, wherein the unit header is a point cloud coding (PCC) network abstraction layer (NAL) unit header, wherein the encoded bitstream further comprises a data unit, and wherein the data unit is a PCC network abstraction layer (NAL) unit.
 17. The coding apparatus of claim 15, wherein the type indicator specifies that the type of content is a geometry component, a texture component, auxiliary information, or an occupancy map.
 18. The coding apparatus of claim 15, wherein the payload comprises a High Efficiency Video Coding (HEVC) network abstraction layer (NAL) unit or an Advanced Video Coding (AVC) network abstraction layer (NAL) unit.
 19. The coding apparatus of claim 15, wherein the type indicator specifies that the type of content is a geometry component, and wherein the geometry component comprises a set of coordinates associated with a point cloud frame, and wherein the set of coordinates are Cartesian coordinates.
 20. The coding apparatus of claim 15, wherein the type indicator specifies that the type of content is a texture component, and wherein the texture component comprises a set of luma sample values of a point cloud frame.